from IPython.display import Image, display

a = Image(filename='images/linear-regression_8_1.png')
b = Image(filename='images/linear-regression_8_2.png')
c = Image(filename='images/linear-regression_8_3.png')
d = Image(filename='images/linear-regression_8_4.png')
e = Image(filename='images/linear-regression_8_5.png')
f = Image(filename='images/linear-regression_8_6.png')
display(a, b, c, d, e, f)
| Metric | Score |
|---|---|
| R2 | 0.6260 |
| RMSE | 47.5497 |
| MAE | 35.5567 |
| DS | 53.2189% |
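As a rough illustration of how these metrics could be computed (a sketch assuming DS denotes directional symmetry, i.e. the percentage of steps where the predicted and actual changes share the same sign; the sample values are illustrative):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute R2, RMSE, MAE and directional symmetry (DS, in %)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mae = np.mean(np.abs(y_true - y_pred))
    # DS: fraction of consecutive steps where the predicted change
    # and the actual change move in the same direction
    ds = 100.0 * np.mean(np.sign(np.diff(y_true)) == np.sign(np.diff(y_pred)))
    return r2, rmse, mae, ds

y = [1.0, 2.0, 1.5, 3.0]
p = [1.1, 1.8, 1.6, 2.7]
r2, rmse, mae, ds = regression_metrics(y, p)
```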
a = Image(filename='images/linear-regression_13_2.png')
b = Image(filename='images/linear-regression_13_3.png')
display(a,b)

a = Image(filename='images/bayesian-ridge-regression-lob_19_1.png')
b = Image(filename='images/bayesian-ridge-regression-lob_20_1.png')
display(a,b)
| Metric | Score |
|---|---|
| R2 | 0.0111 |
| RMSE | 0.4493 |
| MAE | 0.2315 |
| DS | 56.47% |
a = Image(filename='images/bayesian-ridge-regression-lob_16_2.png')
b = Image(filename='images/bayesian-ridge-regression-lob_16_3.png')
display(a,b)
| Metric | Score |
|---|---|
| R2 | 0.1184 |
| RMSE | 0.4393 |
| MAE | 0.3103 |
| DS | 59.16% |
a = Image(filename='images/bayesian-ridge-regression-trades_14_2.png')
b = Image(filename='images/bayesian-ridge-regression-trades_14_3.png')
display(a,b)
| Metric | Score |
|---|---|
| R2 | -0.0051 |
| RMSE | 0.4482 |
| MAE | 0.2274 |
| DS | 55.53% |
a = Image(filename='images/support-vector-regression-lob_17_3.png')
b = Image(filename='images/support-vector-regression-lob_17_4.png')
display(a,b)
| Metric | Score |
|---|---|
| R2 | 0.1064 |
| RMSE | 0.4414 |
| MAE | 0.3042 |
| DS | 60.14% |
a = Image(filename='images/support-vector-regression-trades_15_3.png')
b = Image(filename='images/support-vector-regression-trades_15_4.png')
display(a,b)
a = Image(filename='images/support-vector-regression-trades_18_2.png')
b = Image(filename='images/support-vector-regression-trades_18_3.png')
display(a,b)
Given a vector of inputs $X^T = (x_1, x_2, \dots, x_n)$, the model predicts the hypothesis $h(x)$, defined as:
$$h(x) = \beta_0 + \sum_{i = 1}^n \beta_i x_i$$
where $\beta_0$ is the bias or intercept and $\beta = (\beta_1, \beta_2, \dots, \beta_n)$ is the vector of feature weights, learned by minimising a cost function. A commonly used one is the residual sum-of-squares:
$$RSS(\beta) = \sum_{i = 1}^m (y_i - X_i^T\beta)^2$$
where $m$ is the number of training examples, $y_i$ is the target output, and $X_i^T\beta$ is the closed form of the hypothesis obtained by setting $x_0 = 1$.
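Minimising the RSS admits the well-known closed-form (normal-equation) solution $\hat\beta = (X^TX)^{-1}X^Ty$; a minimal sketch on synthetic data (the feature dimensions and true coefficients here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 2
X = rng.normal(size=(n, p))
# Illustrative true relationship: y = 1.0 + 2.0*x1 - 3.0*x2 + noise
y = 1.0 + X @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=n)

# Prepend a column of ones so that the intercept beta_0 is absorbed
# into the weight vector, i.e. x_0 = 1 as in the text
Xb = np.column_stack([np.ones(n), X])

# Normal equations: solve (X^T X) beta = X^T y
beta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

rss = np.sum((y - Xb @ beta) ** 2)
```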
Given a set of competing hypotheses that explain a data set, the Bayesian approach evaluates the posterior probability of each hypothesis and then:
Select the most probable hypothesis
The frequentist formulation of ridge regression introduces the L2 norm as a penalty on the standard residual sum-of-squares cost function: $$PRSS(\beta)_{L2} = \sum_{i = 1}^m(y_i - X_i^T\beta)^2 + \lambda \sum_{j = 1}^n \beta_j^2$$ where $\lambda$ is the regularisation parameter. In the Bayesian context, using an L2 penalty is equivalent to placing a zero-mean Gaussian prior on the weights: $$\beta \sim \mathcal{N}(0, \lambda^{-1} \mathbf{I_p})$$
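The penalised problem also has a closed-form solution, $\hat\beta_{ridge} = (X^TX + \lambda I)^{-1}X^Ty$. A minimal sketch (data and the choice of $\lambda$ are illustrative; the intercept is omitted for simplicity, since by convention it is left unpenalised):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.2 * rng.normal(size=n)

def ridge(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_ols = ridge(X, y, 0.0)   # lam = 0 recovers ordinary least squares
beta_l2 = ridge(X, y, 10.0)   # larger lam shrinks the weights toward 0
```

Increasing $\lambda$ shrinks the weight norm, mirroring the effect of a tighter Gaussian prior $\mathcal{N}(0, \lambda^{-1}\mathbf{I_p})$ on $\beta$.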
Image(filename='images/sgd.png')
Image(filename='images/ga.png')
Image(filename='images/netfeats.png')
Image(filename='images/techinds.png')